An LBS and agent-based simulator for Covid-19 research

The mobility data of citizens provide important information on the epidemic spread including Covid-19. However, the privacy versus security dilemma hinders the utilization of such data. This paper proposed a method to generate pseudo mobility data on a per-agent basis, utilizing the actual geographical environment data provided by LBS to generate the agent-specific mobility trajectories and export them as GPS-like data. Demographic characteristics such as behavior patterns, gender, age, vaccination, and mask-wearing status are also assigned to the agents. A web-based data generator was implemented, enabling users to make detailed settings to meet different research needs. The simulated data indicated the usability of the proposed methods.

www.nature.com/scientificreports/ plans. Research suggested that the mobility of a population could be divided into several movement patterns 23,28 . If these patterns was linearly combined, it was possible to simulate the main human mobility at the urban scale 16 . Our method aimed to generate human mobility data with high spatial and temporal resolution based on agent-specific activity schedules for Covid-19 simulation.
The main contributions of this paper are: • A method for generating pseudo human mobility data on a per-agent basis was proposed for Covid-19 simulation, by combining group pattern settings and agent-specific activity chain generation. • The information provided by LBS is used to generate agents' mobility data conforming to the real road network and the data was processed into GPS-like format with high spatial and temporal resolutions. • In addition to agent mobility data, information such as age, gender, vaccination and mask-wearing status was also provided for each generated trajectory for a purpose of better virus transmission simulation. • Based on the proposed method, a web-based generator was also implemented, which can be used by researchers to generate mobility data flexibly without any code-level changes.

Methods
Our proposed method could be roughly divided into three steps (see Fig. 1): • Activity-based group mobility pattern design.
• Data processing for GPS trajectories.
Activity-based group mobility pattern design. We defined the group pattern to determine the activities and attributes of each individual who belonged to the group, which contains a set of parameters as well as a schedule template. Since human mobility data consists of multiple individuals with different behavioral characteristics, it was challenging to distill individual-level statistical patterns from such aggregated data 29 . Instead, it had been well reported that we could use ABM to model human mobility data from the bottom-up by dividing agents into several groups with heterogeneous behavioral patterns 26,28 . Therefore, we set the following parameters for the group patterns (see Table 1). Among them, group name and group probability were parameters of the group itself. Virus infection related attributes could influence the transmission and development of Covid-19, including vaccination 30 , mask 31 , age 32 , and gender distribution 32,33 . Meanwhile, mobility related attribute, i.e., driving preference was also included as a group parameter that could influence an agent's travel choice as well as one's infection probability during the trip. A schedule template (see Fig. 2) was assigned to all agents who belong to this group. The schedule template was used to constrain the activities of all agents in the group, which was assumed to be known to the user. The time duration of the schedule template was one week and could be divided into two categories: working days www.nature.com/scientificreports/ and non-working days. By default, Monday to Friday were working days, and Saturday and Sunday were nonworking days. The time resolution for each day was one hour. The schedule template specified the activity for the agents during a certain time window, in a form of categories of target places and the maximum travel distance. Be notice that for each agent in the same group, even though they all followed the schedule template, the target places they went to could be different, which resulting in different arrival times and thus affecting their following mobility activities. Therefore, each agent had its agent-specific daily activity chains when LBS information was acquired (see detail in the following session). Considering the fact that the virus transmission was demographically heterogeneous (such as age and gender 23 ), e.g., Covid-19 was significantly more lethal for older people than other populations 34 , we also assigned epidemic correlated attributes in the group pattern. These attributes were vital for virus transmission simulation, but missing in traditional human mobility data such as mobile phone location data and GPS data. The attributes of agents are listed in Table 2, whose values were affected by the corresponding parameters in the group pattern.

LBS-based agent trajectory generation.
With the pre-set group pattern, our method generated agentspecific schedules for each agent in this step. We generated the trajectory dataset based on the schedule template on a per-agent basis in two steps (see Fig. 3): (1) determined a target place within the Place of Interest (POI) category as defined in the schedule template using POIs search service to build an agent-specific schedule. (2) generating trajectories according to the agent-specific schedule, with the route planning services. We send a request to Search POIs Service to get a list of POIs returned by the LBS provider based on the POI category and maximum distance specified in each agent's schedule template. Currently, we randomly select one of the eligible www.nature.com/scientificreports/ POIs and assign it to the schedule template. Repeating this step, we could generate an agent-specific schedule based on the schedule template but with different destinations. The agent-specific schedule now contains factual geographic information.
After generating the agent-specific schedules, we used the route planning service of the LBS provider to retrieve the corresponding data. By setting parameters such as origin and destination, travel mode, and estimated departure time, we retrieved the planned route that matches the road networks and accessibility. For travel mode, we considered driving as an independent mode with almost zero possibility for infection during the trip. Different driving preference rates were assigned to each group pattern. Meanwhile, the probability of driving was also affected by the travel distance. For each travel, the probability of driving was computed as following: where the probability function p(d) received the distance d between the origin and destination as an input. D was the distance threshold, which we currently set to 5000 m. p v was the driving preference parameter set in the group pattern. α had a value of 0.4, making the probability of driving close to 100% when p v equals 10. Other travel modes apart from driving were considered as modes with infection probabilities, such as walking, cycling, and public transportations, whose probability were determined in a similar way. Equation (1) is also used to determine the probability of traveling by public transportation, cycling, walking, etc. (with different D).
Although agents in the same group share the same schedule template on a week basis, when different POI and transportation choice was selected, agent-specific schedules were generated on an individual basis. Especially for route planning services, routes matching the local road conditions and the operation of public transportation could also be achieved. Therefore, the diversity of the generated data could be expected, which made the biggest differences between our method and earlier works. Compared to traditional sources of population mobility data,  Figure 3. Generating agent-specific schedule with LBS. According to the order of the schedule template, send a request to the LBS provider (which provides the search POIs service), match the POI information to the schedule template, and use the obtained POI information to continue to send a request to the POI provider (which provides the route planning service) to get the corresponding trajectory data. Repeat these steps to generate an agent-specific schedule after traversing the entire schedule template, which is no more extended POI categories but factual POI information. www.nature.com/scientificreports/ the data provided by using LBS is pre-processed, which eliminates the need for map matching 35 . Depended on the service area of the LBS provider, data in large geographical area (province, country level) could be generated accurately. In addition, LBS data was cloud-based, therefore the requirement of local computing resources was reduced, as we only need to perform simple processing and stitching operations on the received data.
Data processing for GPS trajectories. After the first two steps, we generated the trajectory data for each trip according to agent-specific schedules, however, these trajectories returned from the LBS provider were lowresolution and only as a route indicator when no turning required between any two adjacent points on the route (see Fig. 4b). In addition to traditional travel survey 36 , human mobility data currently used for Covid-19 research were multi-sourced and multi-faceted 29 , including trip records from the Automated Fare Collection (AFC) 28 , Call Detail Records (CDRs) from mobile network operators 14,37 , location data from the Global Positioning System (GPS) 15 and geotagged data from social media 38 , etc. Among them, travel survey was time-consuming and often require further extraction 37 . Geotagged data from social media were discontinuous and had the problem of incomplete population coverage 38 . GPS data were accurate enough and could be collected with mobile devices continuously in a daily basis, therefore, post-processing was performed to generate GPS trajectory data for each trip segments (POI to POI). For each segment, we obtain the polyline of the trajectory and the distance − → d of the segment, and the time spent t from the LBS. We could then calculate the average velocity − → v of the segment. After we set the time interval t for sampling, we could obtain the distance − → d t of the advance at the time interval t.
With the − → d t calculated, we could then compute the distance and resample each point in the original route in sequence (See Fig. 4a for a schematic of the process). Studies reported that GPS data with a data interval of 5 s reduces data storage requirements without compromising usability and accuracy 39 . Therefore, we set ∆t to 5 s. We could find that the resampled points were denser after the resampling process and could accurately reflect the segments' speed change (See Fig. 4b).
Web-based data generator. To facilitate the setting of the parameters of the group pattern as well as the schedule template, we implemented a web-based data generator with a Vue.js frontend and a Spring Boot backend. We chose Gaode Open Platform as our LBS provider because of its good usability in China. A convenient interface for setting was provided, and the usage details were shown in Supplementary Fig. S1a-c. We generated the dataset with the generator and export each agent's data as an excel file, including the agent's attributes and all the trajectories generated according to the assigned time range. Various types of information for each time spot www.nature.com/scientificreports/ were recorded, as shown in Supplementary Table S1. Our method could also provide information about places and individuals in addition to the GPS trajectory data, which was crucial for research on the spread of infectious diseases.

Results
We set six different group patterns using the data generator to validate our method, including general worker, overtime worker, freelancer, middle school student, pupil, and retiree. The detailed parameter settings was shown in Supplementary Table S2. Ordinary worker had the most enormous group rate of 40%, followed by retiree at 25%. The parameters such as group rate and vaccination rate were set according to the census data of the city. Except for retiree and freelancer, the schedule templates for the rest of the groups distinguish between working days and non-working days. We set these groups to travel more intensively on working days, including pupils and middle school students. One of the reasons we did not consider setting the college student group is that not all cities have college students, and college students tend to be confined to the interior of the campus and have more specific behavior patterns 40 . Nonpharmaceutical interventions (NPIs) had been employed worldwide to stop the rapid spread of Covid-19. NPIs were grouped into 3 major categories: school closure; cancellation of public gatherings; and isolation and quarantine 41 . We set four levels of restriction policies from rank0 to rank3 in the generator. The settings for these policies are shown in Supplementary Table S3. We apply rank0 on day1 to 6, while apply rank1, rank2 and rank3 on day7, day8, and day9, respectively.
We used the generator to generate 10,000 agents. The trajectories for each agent were generated with a time range of 9 days in an urban district of Hangzhou. The start time of the trajectory dataset was 00:00 on March 1, 2022, and the end time was 23:59 on March 9, 2022. The policy settings were as in Supplementary Fig. S2. As a result, 3942 ordinary worker trajectories, 1045 overtime worker trajectories, 2509 retiree trajectories, 910 middle school student trajectories, 575 pupil trajectories, and 1019 freelancers trajectories were generated. In all the generated agents, the male to female ratio was 100.00:99.40, and the average age was 42.36. The age distribution of all the agents was shown in Supplementary Fig. S3. A heat map based on the location of the starting point of the generated trajectory is given in Fig. 5a. The generated results were compared with the high accuracy population distribution published by WorldPop 42 (https:// hub. world pop. org/ geoda ta/ summa ry? id= 49730) as shown in Fig. 5b, where the boundary of Gongshu District was indicated with the black dash line. As seen, the generated population shows the same pattern as the published data, which was dense in the southeast and sparse in the north. Figure 6 shows the generated trajectories of two ordinary workers on day1. Even though the two agents shared the same group pattern (with the same schedule template), the generated trajectories were different because each agent-specific schedule was unique. We noted that the workplace in Fig. 6a was farther from home than in Fig. 6b, so the agent in Fig. 6a used public transportation to travel to the workplace and spent 48 min on the trip. In contrast, Fig. 6b chose to ride to the workplace, which took only 1 min. We also noted that even though they arrived the bus station at different time (20:59 vs. 20:54, respectively), the agents in Fig. 6a and b depart by bus at the same time, which indicates that our method was able to simulate the shifts in public transportation while identify the agents travel in the same bus. www.nature.com/scientificreports/

Discussion
The radius of gyration can be used to measure the spatial distribution of human movement 43 . In general, a larger radius of gyration for an individual represents a wider range of activity. For each agent generated, the radius of gyration r g during the period g is computed as follows: where n is the number of places that stay in period g . a c is the location of the center of mass calculated from all places. |a i − a c | is the distance from the center of mass to each passing site. As the results, the curve in Fig. 7b was slightly right-shifted compared to Fig. 7a, which indicates that the activity range on non-working days is more abundant. This may because people go to farther places during holidays rather than the usually nearby workplaces. Notice that most of the group' radius of gyration is concentrated around 3 ~ 4 km and mostly less than 8 km, which were small compared to other similar studies 44 . The possible reason for this may because that we limited the trip area to Gongshu District in Hangzhou only, which made all the activities outside of the district were impossible. We could find in Fig. 7c that the pupil group had the steepest curve, indicating that the travel patterns of this group were highly similar. Middle school student, retiree, ordinary worker, and overtime worker had smooth curves, indicating that the heterogeneity of movement patterns within these four groups was greater compared to pupil group. The curves for Freelancer were the flattest, which means that their activities were the most irregular. This was consistent with the actual attributes for freelancers as more significant heterogeneity of group patterns. Figure 8a showed the relationship between travel distance with travel speed and the number of travels for all trajectories during day1 and day9. We noted that the number of travels is negatively correlated with the distance, which meant that most agent travels in a small-scale range, which was consistent with the real situation. The travel speed was positively proportional to the distance, which meant that our method enables the agent to choose the travel tool reasonably according to travel distance. We can estimate from Fig. 8a that most agents choose to walk for trips within 3 km, and as the distance increases, the agents choose to ride, drive, or by public transportation more often. In addition, the majority of the trips were within 10 km, and long trips over 25 km are rare. Figure 8b showed the heat map of different groups of people at different hours of the day. For most of the agents, the number of travel behaviors from 0 to 5 h is almost 0. It could also be found that the travel peaks of different populations differ. Figure 9a showed the number of people in different POI categories on day4 (working day) and day5 (nonworking day). In the early morning of day4, almost all the agents were resting at home. After about 6:00 am, agents left home and the number of people at home dropped rapidly, while the number of people in other places gradually increased. Since day4 was a working day, most agents go to work, so the rise in the workplace was the most rapid. We noticed that the peak for dining places was in the morning, noon and evening, regardless of working days or non-working days. Compared to working days, the number of people at consumption places and (4) r g = 1 n n i=1 |a i − a c | 2 Figure 6. The trajectories of two agents (a, b) of ordinary worker group pattern on day1. (a, b) are the trajectories of two agents belonging to the ordinary worker we selected day1. The main reason for choosing these two agents is that they both use public transportation. We labeled the means of transportation they were passing and the time they stayed on the premises. www.nature.com/scientificreports/ entertainment places on non-working days rise sharply, while the number of people at workplaces decreased substantially. The above-observed information was consistent with our setting as well as people's common patterns. Figure 9b showed the effect of different levels of the NPIs on travels from day4 to day9. We noted that agents' activities showed a pattern of two peaks regardless of working and non-working days, i.e., in the morning from about 7 am when many agents depart, and in the evening again around 5 pm. For day7, when rank1 was implemented, there was no significant change from the curve, and only a small number of trips were restricted. For day8, when rank2 was implemented, dining was no longer available, as were recreational, public, and educational activities, and the probability of traveling to work was reduced by 30%. We could note the decrease in student trips in the curve and the decrease in evening peak trips. It was worth noting that there was still a peak in the evening of the day8, which is unrealistic because our current policy simulation did not change the schedule itself. Most trajectories return home in the evening. On day9, when rank3 was applied, almost no one goes out, and only some retirees who were allowed to go to the hospital as set in the schedule. The impact of the policy intervention was especially obvious for pupils and middle school students because the schools were closed. As a result, our method could simulate the responses of different groups of people to the same policy.
In our current results, only 10,000 agents were simulated within Gongshu District, Hangzhou, which was relatively smaller amount comparing to the actual population in the area. To evaluate the scale-up potential of our method, several experiments were performed. For the scale-up of number of generated, we recorded the consuming time for different number of generated, see Supplementary Fig. S4. As a result, the time-consuming increases as the number of agents grew when sequential process was performed for each agent. However, high proportion of the total time consumption (around 97%) was on the response time from the LBS provider. If asynchronous requests are used, there will be an order of magnitude improvement in the generation speed. On the one hand, because LBS-based method provides access to geographical data in various levels, our simulation can be scaled  Supplementary Table S4). Such results indicate that the time consumption did not increase linearly as the distance increases. Therefore, when the geographic scope is scaled up, its effect on the generation time is acceptable. Moreover, even at larger scales than cities, the majority of trips are concentrated within 100 km. In summary, for both needs for an increasement on agent quantitative as well as geographic scope, our approach had the potential to scale-up to the full-scale. Given the problem of service coverage of LBS providers (a single LBS provider may only provide services for part of the countries), it may also be necessary to integrate data from multiple LBS providers if the population at the state-level is to be generated.  The main reason we chose day4 and day5 is that they are Friday and Saturday respectively, making it easy to compare. (b) We consider an agent leaving a place as a departure record and entering a place as an arriving record. We select day4 (working day), day5 and day6 (nonworking days) as well as day7 (policy with rank1 implemented), day8 (policy with rank2 implemented), and day9 (policy with rank3 implemented) for the sake of comparison. www.nature.com/scientificreports/ In this paper, we proposed a method for generating pseudo-human mobility datasets using location-based services (LBS), which had advantages over other studies in similar fields. For example, Ban 45 et al. implemented an agent-based simulator for Covid-19, in which different activity schedules were assigned for three age categories (children, adult, and retired people).Therefore, there were only limited number of daily patterns of activity chains for agents, Liao 46 et al. proposed a simulation method which also used predefined activity patterns for Covid-19 transmission simulation, and the activity location of the same type was selected randomly, which was similar to the strategy utilized in our work. However, their simulation was restrained to on-campus locations, and the daily activity pattern covers only weekdays. In Kumar 47 et al., daily activity schedules were simulated for each individual including activities of home, work, education, shopping or other. Trip trajectories between each two consequent activities were generated according to road and transit networks, similar as ours. However, their simulation was at a 5-min resolution, while ours was in each 5-s. In conclusion, our method was able to generate pseudo population mobility data with high temporal and spatial resolution using only predefined population patterns and data from LBS, which was very helpful for studies related to infectious disease modeling. To the best of our knowledge, this is the first attempt to used LBS to generate pseudo population mobility data.

Conclusion
In this paper, we proposed a method to generate pseudo mobility data on a per-agent basis. To validate our method, we generated 10,000 agents' travel records with a starting point in a district of Hangzhou. The mobility data was analyzed with the distribution of the starting point, the radius of gyration, the travel situation, and the number of people in the premises, and it was concluded that the generated data is in accord with the urbanscale group patterns. Our method could generate not only agent-based mobility data, but also Covid-19-related attributes of the agents, therefore could be used for ABM-based epidemic simulation. In addition, We have implemented a web-based trajectory generator. This allowed the user to configure various parameters to meet the needs of different researches without any code-level changes. However, there were limitations in our current work. First, the agents generated were lightweight in numbers and are restricted at city district level. However, we discuss the potential of our method to scale up in both geographic and agent quantity aspects. Second, the focus of our current work was on the feasibility of generating agent-based daily mobility data by using LBS to acquire geographic and road map information. Although our method included a component of synthetic population generation, it was based on rough calculation. In the future, techniques including Synthetic Reconstruction (SR) or the Combinatorial Optimization (CO) will be adapted. Besides, instead of random selection from POIs returned by the LBS, how to select the agent-specific POI in a more reasonable way will be explored in our future work.

Data availability
The population distribution dataset that we used to compare with our results is publicly available from WorldPop at https:// hub. world pop. org/ geoda ta/ summa ry? id= 49730. All other data included in this study are available upon request by contact with the corresponding author. The dataset we generated is available athttps:// drive. google. com/ drive/ folde rs/ 1a4KC yat7M dP6Ce JvcV1 CwvTm YL1g-woC? usp= shari ng.